Promoting performance and separation of concerns for data mining applications on the grid
Abstract
Grid computing promised to make high-performance computing cheaper and more easily available than traditional supercomputing platforms. The data mining (DM) community welcomed this promise, because DM applications typically process very large datasets and are therefore very resource intensive. However, because the Grid is highly dynamic and parallel data mining is prone to load imbalance, obtaining good data mining performance on the Grid is hard. It typically requires the scheduler to understand the inner workings of the application, which raises two related problems. First, good Grid schedulers tend to be highly specialized for the application they target. Second, changing the application may require changing the scheduler, which can be especially challenging when there is no clear separation between the application code and the scheduler code. We propose and evaluate a knowledge-based approach that provides abstractions to the DM developer and optimizes the DM application on the Grid at runtime.
Journal: Future Generation Comp. Syst.
Volume: 23
Publication date: 2007